HYPERGEOM
Overview
The HYPERGEOM function computes values from the hypergeometric distribution, a discrete probability distribution that models the number of successes when drawing objects without replacement from a finite population. This distribution is fundamental in statistical sampling, quality control, and hypothesis testing scenarios where sampling is performed without replacement.
The hypergeometric distribution arises when N objects are drawn from a population of M objects, n of which are classified as “Type I” (successes). Unlike the binomial distribution, which models independent trials (sampling with replacement), the hypergeometric distribution accounts for the success probability changing as items are removed from the population. For more details, see the SciPy hypergeom documentation.
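To make the changing probability concrete, the following minimal sketch compares drawing two Type I objects in a row with and without replacement. The values M = 20 and n = 7 are borrowed from the examples on this page, and the variable names are illustrative only.

```python
from fractions import Fraction

M, n = 20, 7  # population size and number of Type I objects (illustrative values)

# With replacement (binomial setting): every draw succeeds with probability n/M.
p_with = Fraction(n, M) * Fraction(n, M)

# Without replacement (hypergeometric setting): the second draw sees one fewer
# Type I object and one fewer object overall.
p_without = Fraction(n, M) * Fraction(n - 1, M - 1)

print(p_with, float(p_with))        # 49/400 0.1225
print(p_without, float(p_without))  # 21/190, roughly 0.1105
```

The gap between the two probabilities widens as the sample becomes a larger fraction of the population.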
The probability mass function (PMF) calculates the exact probability of observing k Type I objects in the sample:
P(X = k) = \frac{\binom{n}{k} \binom{M-n}{N-k}}{\binom{M}{N}}
where \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient, M is the total population size, n is the number of success states in the population, and N is the number of draws. In the function signature, M, n, and N correspond to the arguments m, n, and draws, respectively.
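The formula can be checked by hand with Python's standard library. The sketch below is a direct transcription of the PMF above, not the SciPy implementation; the helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb


def hypergeom_pmf(k, M, n, N):
    """P(X = k) for N draws from M objects, n of which are Type I."""
    # Outside the support the probability is zero; this guard also keeps
    # math.comb from receiving a negative argument.
    if k < max(0, N - (M - n)) or k > min(n, N):
        return 0.0
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


# Same inputs as Example 1: 35 * 715 / 125970
print(round(hypergeom_pmf(3, 20, 7, 12), 4))  # 0.1987
```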
This implementation uses SciPy’s scipy.stats.hypergeom, which leverages the Boost C++ Math library for high-precision computations. The function supports multiple calculation modes: PMF for point probabilities, CDF and SF (survival function) for cumulative probabilities, ICDF (inverse CDF) and ISF (inverse survival function) for quantile calculations, and direct computation of descriptive statistics (mean, variance, standard deviation, and median).
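How the modes relate to one another can be sketched in pure Python. This follows the textbook definitions and is not how SciPy computes the values internally; the helper names are hypothetical, and `icdf` is simplified to probabilities in (0, 1].

```python
from math import comb, floor


def pmf(k, M, n, N):
    if k < max(0, N - (M - n)) or k > min(n, N):
        return 0.0
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def cdf(k, M, n, N):
    """P(X <= k): sum the PMF over 0..floor(k)."""
    return sum(pmf(i, M, n, N) for i in range(0, floor(k) + 1))


def sf(k, M, n, N):
    """P(X > k), the complement of the CDF."""
    return 1.0 - cdf(k, M, n, N)


def icdf(p, M, n, N):
    """Smallest k in the support with CDF(k) >= p, for 0 < p <= 1."""
    lo, hi = max(0, N - (M - n)), min(n, N)
    total = 0.0
    for k in range(lo, hi + 1):
        total += pmf(k, M, n, N)
        if total >= p:
            return k
    return hi  # guards against rounding when p is very close to 1


M, n, N = 20, 7, 12
mean = N * n / M                                      # closed-form mean
var = N * (n / M) * (1 - n / M) * (M - N) / (M - 1)   # closed-form variance

print(round(cdf(3, M, n, N), 4))  # 0.2508
print(round(sf(3, M, n, N), 4))   # 0.7492
print(icdf(0.5, M, n, N))         # 4
print(round(mean, 4), round(var, 4))
```

The inverse CDF returns 4 because the CDF is about 0.2508 at k = 3 and first reaches 0.5 at k = 4.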
Common applications include calculating probabilities in lottery scenarios, determining quality control sampling outcomes, computing p-values in Fisher’s exact test for contingency tables, and modeling capture-recapture experiments in ecology.
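As an illustration of the lottery case, assume a hypothetical 6-of-49 game: 49 numbers in total, 6 of them winning, and a ticket that picks 6. Under that assumption the number of matches is hypergeometric with M = 49, n = 6, N = 6. The game parameters and the helper name `match_probability` are assumptions made for this sketch.

```python
from math import comb

M, n, N = 49, 6, 6  # hypothetical 6-of-49 lottery


def match_probability(k):
    """Probability that a ticket matches exactly k of the winning numbers."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


for k in range(N + 1):
    print(k, match_probability(k))

# Matching all six numbers corresponds to exactly one of comb(49, 6) tickets.
print(comb(M, N))  # 13983816
```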
This example function is provided as-is without any representation of accuracy.
Excel Usage
=HYPERGEOM(k, m, n, draws, hypergeom_mode, loc)
- k (list[list], required): Value at which to evaluate (for PMF/CDF/SF) or probability (for ICDF/ISF). Ignored for statistics modes.
- m (int, required): Total number of objects in the population (must be >= 0).
- n (int, required): Number of Type I objects in the population (must be >= 0 and <= m).
- draws (int, required): Number of draws from the population (must be >= 0 and <= m).
- hypergeom_mode (str, optional, default: “pmf”): Output type for the hypergeometric distribution calculation.
- loc (float, optional, default: 0): Location parameter that shifts the distribution.
Returns (float or list[list]): The distribution result as a float for a scalar k, a 2D list of floats for a 2D k, or an error message string.
Examples
Example 1: PMF at k=3
Inputs:
| k | m | n | draws | hypergeom_mode |
|---|---|---|---|---|
| 3 | 20 | 7 | 12 | pmf |
Excel formula:
=HYPERGEOM(3, 20, 7, 12, "pmf")
Expected output:
| Result |
|---|
| 0.1987 |
Example 2: CDF at k=3
Inputs:
| k | m | n | draws | hypergeom_mode |
|---|---|---|---|---|
| 3 | 20 | 7 | 12 | cdf |
Excel formula:
=HYPERGEOM(3, 20, 7, 12, "cdf")
Expected output:
| Result |
|---|
| 0.2508 |
Example 3: Survival function at k=3
Inputs:
| k | m | n | draws | hypergeom_mode |
|---|---|---|---|---|
| 3 | 20 | 7 | 12 | sf |
Excel formula:
=HYPERGEOM(3, 20, 7, 12, "sf")
Expected output:
| Result |
|---|
| 0.7492 |
Example 4: Inverse CDF at p=0.5
Inputs:
| k | m | n | draws | hypergeom_mode |
|---|---|---|---|---|
| 0.5 | 20 | 7 | 12 | icdf |
Excel formula:
=HYPERGEOM(0.5, 20, 7, 12, "icdf")
Expected output:
| Result |
|---|
| 4 |
Example 5: Vector PMF input
Inputs:
| k | m | n | draws | hypergeom_mode |
|---|---|---|---|---|
| {2, 3} | 20 | 7 | 12 | pmf |
Excel formula:
=HYPERGEOM({2,3}, 20, 7, 12, "pmf")
Expected output:
| Result | |
|---|---|
| 0.0477 | 0.1987 |
Python Code
from scipy.stats import hypergeom as scipy_hypergeom


def hypergeom(k, m, n, draws, hypergeom_mode='pmf', loc=0):
    """
    Compute hypergeometric distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.hypergeom.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        k (list[list]): Value at which to evaluate (for PMF/CDF/SF) or probability (for ICDF/ISF). Ignored for statistics modes.
        m (int): Total number of objects in the population (must be >= 0).
        n (int): Number of Type I objects in the population (must be >= 0 and <= m).
        draws (int): Number of draws from the population (must be >= 0 and <= m).
        hypergeom_mode (str, optional): Output type for the hypergeometric distribution calculation. Valid options (lowercase): pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.

    Returns:
        float: Distribution result for a scalar k, a 2D list of floats for a 2D k, or an error message string.
    """
    def to2d(x):
        # Wrap a scalar as a 1x1 2D list; pass lists through unchanged.
        return [[x]] if not isinstance(x, list) else x

    # Validate m, n, draws
    try:
        m_val = int(m)
        n_val = int(n)
        draws_val = int(draws)
    except (ValueError, TypeError):
        return "Invalid input: m, n, draws must be integers."
    if m_val < 0 or n_val < 0 or draws_val < 0:
        return "Invalid input: m, n, draws must be >= 0."
    if n_val > m_val:
        return "Invalid input: n must be <= m."
    if draws_val > m_val:
        return "Invalid input: draws must be <= m."

    # Validate loc
    try:
        loc_val = float(loc)
    except (ValueError, TypeError):
        return "Invalid input: loc must be a number."

    # Validate hypergeom_mode
    valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
    if not isinstance(hypergeom_mode, str) or hypergeom_mode not in valid_modes:
        return f"Invalid input: hypergeom_mode must be one of {sorted(valid_modes)}."

    # Handle statistics (k is ignored)
    if hypergeom_mode == "mean":
        return float(scipy_hypergeom.mean(m_val, n_val, draws_val, loc=loc_val))
    if hypergeom_mode == "var":
        return float(scipy_hypergeom.var(m_val, n_val, draws_val, loc=loc_val))
    if hypergeom_mode == "std":
        return float(scipy_hypergeom.std(m_val, n_val, draws_val, loc=loc_val))
    if hypergeom_mode == "median":
        return float(scipy_hypergeom.median(m_val, n_val, draws_val, loc=loc_val))

    # PMF, CDF, SF, ICDF, ISF
    def compute(val):
        try:
            kval = float(val)
        except (ValueError, TypeError):
            return "Invalid input: k must be a number."
        if hypergeom_mode == "pmf":
            return float(scipy_hypergeom.pmf(kval, m_val, n_val, draws_val, loc=loc_val))
        elif hypergeom_mode == "cdf":
            return float(scipy_hypergeom.cdf(kval, m_val, n_val, draws_val, loc=loc_val))
        elif hypergeom_mode == "sf":
            return float(scipy_hypergeom.sf(kval, m_val, n_val, draws_val, loc=loc_val))
        elif hypergeom_mode == "icdf":
            if not (0 <= kval <= 1):
                return "Invalid input: probability must be between 0 and 1 for icdf."
            # SciPy names the inverse CDF "ppf" (percent point function).
            return float(scipy_hypergeom.ppf(kval, m_val, n_val, draws_val, loc=loc_val))
        elif hypergeom_mode == "isf":
            if not (0 <= kval <= 1):
                return "Invalid input: probability must be between 0 and 1 for isf."
            return float(scipy_hypergeom.isf(kval, m_val, n_val, draws_val, loc=loc_val))
        return "Unknown mode"

    # Process k
    k_list = to2d(k)
    if not isinstance(k_list, list) or not all(isinstance(row, list) for row in k_list):
        return "Invalid input: k must be a scalar or 2D list."
    result = []
    for row in k_list:
        result_row = []
        for val in row:
            out = compute(val)
            if isinstance(out, str):
                # Stop at the first error message and return it as the result.
                return out
            result_row.append(out)
        result.append(result_row)

    # Return scalar if input was scalar
    if not isinstance(k, list) and len(result) == 1 and len(result[0]) == 1:
        return result[0][0]
    return result
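
The scalar-or-2D-list handling at the end of the function can be isolated into a small reusable helper. The sketch below is illustrative and not part of the original function: `map_2d` and `square` are hypothetical names, and the convention that any returned string is an error message mirrors the function above.

```python
def map_2d(value, func):
    """Apply func to a scalar or to every cell of a 2D list.

    Returns a scalar for scalar input, a 2D list for 2D input, or the first
    error string produced by func.
    """
    is_scalar = not isinstance(value, list)
    rows = [[value]] if is_scalar else value
    if not all(isinstance(row, list) for row in rows):
        return "Invalid input: expected a scalar or 2D list."
    result = []
    for row in rows:
        out_row = []
        for cell in row:
            out = func(cell)
            if isinstance(out, str):  # propagate the first error message
                return out
            out_row.append(out)
        result.append(out_row)
    return result[0][0] if is_scalar else result


def square(x):
    """Toy cell function that follows the same error-string convention."""
    try:
        return float(x) ** 2
    except (TypeError, ValueError):
        return "Invalid input: not a number."


print(map_2d(3, square))                 # 9.0
print(map_2d([[1, 2], [3, 4]], square))  # [[1.0, 4.0], [9.0, 16.0]]
print(map_2d([[1, "x"]], square))        # Invalid input: not a number.
```

Returning error strings instead of raising exceptions keeps the result displayable in a spreadsheet cell, at the cost of callers having to check the return type.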